(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)



(19) World Intellectual Property Organization 

International Bureau 

(43) International Publication Date 
3 May 2001 (03.05.2001) 




PCT 



(10) International Publication Number

WO 01/31640 A1

(51) International Patent Classification 7: G10L 21/02

(21) International Application Number: PCT/EP00/10713

(22) International Filing Date: 27 October 2000 (27.10.2000)

(25) Filing Language: English 

(26) Publication Language: English 

(30) Priority Data: 

99203565.9    29 October 1999 (29.10.1999)    EP

(71) Applicant (for all designated States except US): KONINKLIJKE
PHILIPS ELECTRONICS N.V. [NL/NL];
Groenewoudseweg 1, NL-5621 BA Eindhoven (NL).

(72) Inventor; and

(75) Inventor/Applicant (for US only): HUANG, Chao-Shih,
J. [CN/NL]; Prof. Holstlaan 6, NL-5656 AA Eindhoven
(NL).

(74) Agent: HOEKSTRA, Jelle; Internationaal Octrooibureau
B.V., Prof. Holstlaan 6, NL-5656 AA Eindhoven (NL).

(81) Designated States (national): JP, US.

(84) Designated States (regional): European patent (AT, BE,
CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC,
NL, PT, SE).

Published: 

— With international search report, 

— Before the expiration of the time limit for amending the 
claims and to be republished in the event of receipt of 
amendments. 

For two-letter codes and other abbreviations, refer to the "Guid- 
ance Notes on Codes and Abbreviations" appearing at the begin- 
ning of each regular issue of the PCT Gazette. 






(54) Title: ELIMINATION OF NOISE FROM A SPEECH SIGNAL 

(57) Abstract: A method for reducing noise in a noisy time-varying speech input signal y includes receiving the input signal y
and deriving a plurality of spectral component signals representing respective magnitudes $|Y(k)|$ of spectral components of the input
signal y. A correlation coefficient $\gamma_{sn}$ is obtained which indicates a correlation in the spectral domain between a clean speech signal
component s and a noise signal component n present in the input signal y (y = s + n). Magnitudes of respective noise-suppressed
spectral components $\hat S(k)$ are estimated by solving a correlation equation which gives a relationship between the magnitudes of the
respective spectral components $|Y(k)|$ of the noisy input signal y, the spectral components $|S(k)|$ of the clean speech signal s, and
the spectral components $|N(k)|$ of the noise signal n, where the equation includes the correlation based on the obtained correlation
coefficient $\gamma_{sn}$. Preferably, the correlation equation is given by $|Y(k)|^a = |S(k)|^a + |N(k)|^a + \gamma_{sn}|S(k)||N(k)|$.

The invention relates to a method for reducing noise in a noisy time-varying
input signal, such as a speech signal. The invention further relates to an apparatus for reducing
noise in a noisy time-varying input signal.

The presence of noise in a time-varying input signal hinders the accuracy and 
quality of processing the signal. This is particularly the case for processing a speech signal, 
such as for instance occurs when a speech signal is encoded. The presence of noise is even
more destructive if the signal is not ultimately presented to a user, who can cope relatively
well with the presence of noise, but is instead processed automatically, as is for
instance the case with a speech signal that is recognized automatically. Automatic speech
recognition and coding systems are increasingly used. Although the performance of such
systems is continuously improving, it is desired that the accuracy be increased further,
particularly in adverse environments, such as having a low signal-to-noise ratio (SNR) or a
low bandwidth signal. Normally, speech recognition systems compare a representation of an
input speech signal against a model $\Lambda_x$ of reference signals, such as hidden Markov models
(HMMs) built from representations of a training speech signal. The representations are usually
observation vectors with LPC or cepstral components.

In practice a mismatch exists between the conditions under which the reference 
signals (and thus the models) were obtained and the input signal conditions. The reference 
signals are usually relatively clean (high SNR, high bandwidth), whereas the input signal 
during actual use is distorted (lower SNR, and/or lower bandwidth). It is, therefore, desired to 
eliminate at least part of the noise present in the input signal in order to obtain a noise- 
suppressed signal. 

A conventional way of estimating a noise-suppressed speech signal ('clean'
speech) is to use a spectral subtraction technique. In the discrete-time domain, noisy speech y
can be represented as:

$$y(i) = s(i) + n(i), \qquad 0 \le i \le T-1, \tag{1}$$
where s, n, y denote clean speech, noise and noisy speech respectively, and where T denotes
the length of the speech and i is the time index. The conventional spectral subtraction
technique involves determining the spectral components of the noisy speech and estimating
the spectral components of the noise. The spectral components may, for instance, be calculated
using a Fast Fourier transform (FFT). The noise spectral components may be estimated once
from a part of a signal with predominantly representative noise. Preferably, the noise is
estimated 'on-the-fly', for instance each time a 'silent' part is detected in the input signal with
no significant amount of speech signal. In the general spectral subtraction technique, the
noise-suppressed speech is estimated by subtracting an average noise spectrum from the noisy
speech spectrum:

$$|S(w;m)|^a = |Y(w;m)|^a - |N(w;m)|^a \tag{2}$$
where S(w;m), Y(w;m), and N(w;m) are the magnitude spectrums of the estimated speech s,
noisy speech y and noise n, and w and m are the frequency and time indices, respectively. The
case of a = 2 is referred to as power spectral subtraction. The subtraction is usually called
magnitude spectral subtraction if a = 1.

Due to the subtraction, the estimated spectrum is not guaranteed to be positive 
in the conventional spectral subtraction techniques. US 5,749,068 describes setting those 
spectral components to zero for which the subtraction yields a negative outcome: 

$$S(w) = \max\{\,Y(w) - \alpha \cdot N(w),\; 0\,\} \tag{3}$$
Setting the spectral components to zero (or a low default value) is referred to as 'taking floor'
for the negative spectral components. The parameter $\alpha$, with a positive value, designates the
degree of eliminating noise components. US 5,749,068 describes an advanced way of 
estimating the spectral components of the noise, but still the conventional spectral subtraction 
of equation (3) is used. 
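
The conventional subtraction with flooring can be sketched as follows. This is a minimal illustration, assuming the per-bin magnitudes are already available as Python lists. The function name `spectral_subtract` and the parameters `alpha` and `floor` are hypothetical, and applying `alpha` to the noise term raised to the power `a` is an assumption of this sketch rather than something the cited reference specifies.

```python
def spectral_subtract(y_mag, n_mag, a=2, alpha=1.0, floor=0.0):
    """Conventional spectral subtraction with flooring (illustrative sketch).

    y_mag, n_mag: per-bin magnitudes of the noisy speech and the estimated noise.
    a: 1 for magnitude subtraction, 2 for power subtraction.
    alpha: positive factor scaling the subtracted noise term (assumed placement).
    floor: value used when the subtraction would be negative ('taking floor').
    Returns the estimated clean-speech magnitudes per bin.
    """
    if len(y_mag) != len(n_mag):
        raise ValueError("spectra must have the same number of bins")
    estimate = []
    for y, n in zip(y_mag, n_mag):
        diff = y ** a - alpha * n ** a              # may become negative
        estimate.append(max(diff, floor) ** (1.0 / a))  # floor, then back to a magnitude
    return estimate
```

With `y_mag = [5.0, 1.0]` and `n_mag = [3.0, 2.0]`, the second bin would be negative after subtraction and is therefore floored, which is exactly the situation the negative spectrum ratio counts.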

Taking floor for negative spectral components provides a major limitation of 
spectral subtraction techniques, introducing residual noise with musical tone artifacts into the 
estimated speech. 

In order to investigate the limitation of the conventional spectral subtraction
techniques, the inventor has carried out an experiment for calculating the ratio of negative
spectrum (i.e. the relative number of spectral components which would have a negative value).
The negative spectrum ratio $NSR_{con}$ for the conventional spectral subtraction technique is
defined as follows:

$$NSR_{con} = \frac{1}{M} \sum_{k=0}^{M-1} f_{NS}\left(|Y(k)|^a - |\hat N(k)|^a\right) \tag{4}$$

$$f_{NS}(x) = \begin{cases} 1 & x < 0, \\ 0 & \text{otherwise,} \end{cases} \tag{5}$$

where $|Y(k)|$ is the corresponding magnitude spectrum of the testing speech y, $|\hat N(k)|$ is the
noise spectrum estimated from a pause (non-speech segment), k denotes the k-th spectral
component and M represents the total number of spectral components over which the ratio is
determined, for instance the number of spectral components in one frame or in the whole
testing utterance.
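
As a small illustration of equations (4) and (5), the ratio can be computed by counting the bins whose subtraction result is negative. The function name `negative_spectrum_ratio` is hypothetical and the inputs are assumed to be plain lists of magnitudes.

```python
def negative_spectrum_ratio(y_mag, n_mag, a=2):
    """Fraction of bins for which |Y(k)|^a - |N(k)|^a is negative."""
    if not y_mag or len(y_mag) != len(n_mag):
        raise ValueError("need two non-empty spectra of equal length")
    # The zero-one function of equation (5): count 1 for each negative difference.
    negatives = sum(1 for y, n in zip(y_mag, n_mag) if y ** a - n ** a < 0)
    return negatives / len(y_mag)
```

Multiplying the returned fraction by 100 gives a percentage comparable to the values tabulated for the conventional technique.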

The following table gives the negative spectrum ratio $NSR_{con}$ for various
signal-to-noise ratios (SNRs) with a = 2. It has been found that the negative spectrum ratio $NSR_{con}$
even reaches 34.6% at clean conditions. This illustrates that particularly at higher SNR level
the conventional spectral subtraction technique introduces some residual noise, limiting the
use of the technique.



SNR (dB)    Negative Spectrum Ratio NSR_con (%)
Clean       34.6
40          22.4
35          18.7
30          14.6
25          10.7
20           7.3
15           4.5
10           2.4
5            1.0
0            0.2



It is an object of the invention to overcome the limitation of the conventional 
spectral subtraction technique. 

To meet the object of the invention, the method for reducing noise in a noisy
time-varying input signal y, such as a speech signal, includes:

receiving the noisy time-varying input signal;

deriving from the signal a plurality of spectral component signals representing
respective magnitudes of spectral components of the input signal;

obtaining a correlation coefficient $\gamma_{sn}$ indicative of a correlation in the spectral
domain between a clean speech signal component s and a noise signal component n present in
the input signal (y = s + n); and

estimating magnitudes of respective noise-suppressed spectral components
$\hat S(k)$ by solving an equation giving a relationship between the magnitudes of the respective
spectral components $|Y(k)|$ of the noisy input signal y, the spectral components $|S(k)|$ of the
clean speech signal s, and the spectral components $|N(k)|$ of the noise signal n, where the
equation includes the correlation based on the obtained correlation coefficient $\gamma_{sn}$. Preferably,
the correlation equation is given by:

$$|Y(k)|^a = |S(k)|^a + |N(k)|^a + \gamma_{sn}|S(k)||N(k)|$$

where a could be 1 or 2 for magnitude and power spectrum, respectively. Instead of a
conventional spectral subtraction, this equation is solved, which is based on a correlation
coefficient $\gamma_{sn}$ between the clean speech s and the noise n in the spectral domain. Solving the
equation can be seen as 'correlated spectral subtraction' (CSS).
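
As a quick consistency check with illustrative numbers (not taken from the source), a zero coefficient reduces the correlation equation to conventional subtraction, while a negative coefficient yields a larger speech estimate. For a = 1 with $|Y(k)| = 3$ and $|N(k)| = 2$:

```latex
\begin{align*}
\gamma_{sn} = 0:    &\quad |S(k)| = |Y(k)| - |N(k)| = 3 - 2 = 1 \\
\gamma_{sn} = -0.2: &\quad |S(k)| = \frac{|Y(k)| - |N(k)|}{1 + \gamma_{sn}|N(k)|}
                       = \frac{1}{1 - 0.4} \approx 1.667 \\
\text{check}:       &\quad 1.667 + 2 + (-0.2)(1.667)(2) \approx 3 = |Y(k)|
\end{align*}
```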

The correlation coefficient $\gamma_{sn}$ may be fixed, for instance based on analyzing
representative input signals. Preferably, the correlation coefficient $\gamma_{sn}$ is estimated based on the
actual input signal. Advantageously, the estimation is based on minimizing a negative
spectrum ratio. Preferably, the expected negative spectrum ratio R is defined as:

$$R = E\{f_{ns}\} = \frac{1}{M}\sum_{k=0}^{M-1} f_{ns}\left(|Y(k)|^a - |\hat N(k)|^a - \gamma_{sn}|\hat S(k)||\hat N(k)|\right)$$

where advantageously the 'zero-one' function $f_{ns}$ is given by the differentiable function:

$$f_{ns}(x) = \frac{1}{1 + \exp(-\alpha \cdot x + \beta)}$$

By applying the theory of adaptive learning algorithms, the correlation coefficient is
advantageously obtained by a gradient operation that updates the estimate $\gamma_{sn}^{(m)}$ of
iteration m to $\gamma_{sn}^{(m+1)}$.

The correlation coefficient can be learned along the direction of NSR decrement. Preferably,
this is done in an iterative algorithm.

The equation representing the correlated spectral subtraction may be solved
directly. Preferably, the equation is solved in an iterative manner, improving the estimate of
the clean speech.



These and other aspects of the invention will be apparent from and elucidated 
with reference to the embodiments shown in the drawings. 

The figure shows a block diagram of a conventional speech processing system 
wherein the invention can be used. 

General description of a speech recognition system 

The noise reduction according to the invention is particularly useful for 
processing noisy speech signals, such as coding such a signal or automatically recognizing 
such a signal. Here a general description of a speech recognition system is given. A person 
skilled in the art can equally well apply the noise elimination technique in a speech coding 
system. 

Speech recognition systems, such as large vocabulary continuous speech 
recognition systems, typically use a collection of recognition models to recognize an input 
pattern. For instance, an acoustic model and a vocabulary may be used to recognize words and 
a language model may be used to improve the basic recognition result. The figure illustrates a 
typical structure of a large vocabulary continuous speech recognition system 100. The 
following definitions are used for describing the system and recognition method: 



$\Lambda_x$: a set of trained speech models

X: the original speech which matches the model $\Lambda_x$

Y: the testing speech

$\Lambda_y$: the matched models for testing environment

W: the word sequence

S: the decoded sequences that can be words, syllables, sub-word units, states or
mixture components, or other suitable representations.

The system 100 comprises a spectral analysis subsystem 110 and a unit
matching subsystem 120. In the spectral analysis subsystem 110 the speech input signal (SIS)
is spectrally and/or temporally analyzed to calculate a representative vector of features
(observation vector, OV). Typically, the speech signal is digitized (e.g. sampled at a rate of
6.67 kHz) and pre-processed, for instance by applying pre-emphasis. Consecutive samples are
grouped (blocked) into frames, corresponding to, for instance, 32 msec of speech signal.
Successive frames partially overlap, for instance, 16 msec. Often the Linear Predictive Coding
(LPC) spectral analysis method is used to calculate for each frame a representative vector of
features (observation vector). The feature vector may, for instance, have 24, 32 or 63
components. The standard approach to large vocabulary continuous speech recognition is to
assume a probabilistic model of speech production, whereby a specified word sequence
$W = w_1 w_2 w_3 \ldots w_q$ produces a sequence of acoustic observation vectors $Y = y_1 y_2 y_3 \ldots y_T$. The
recognition error can be statistically minimized by determining the sequence of words
$w_1 w_2 w_3 \ldots w_q$ which most probably caused the observed sequence of observation vectors
$y_1 y_2 y_3 \ldots y_T$ (over time t = 1, ..., T), where the observation vectors are the outcome of the spectral
analysis subsystem 110. This results in determining the maximum a posteriori probability:

$$\max_W P(W \mid Y, \Lambda_x), \text{ for all possible word sequences } W$$
By applying Bayes' theorem on conditional probabilities, $P(W \mid Y, \Lambda_x)$ is given by:

$$P(W \mid Y, \Lambda_x) = \frac{P(Y \mid W, \Lambda_x)\, P(W)}{P(Y)}$$
Since P(Y) is independent of W, the most probable word sequence is given by:

$$W = \arg\max_W P(Y, W \mid \Lambda_x) = \arg\max_W P(Y \mid W, \Lambda_x)\, P(W) \tag{a}$$
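
The decision rule of equation (a) can be illustrated with a toy sketch that works on log-probabilities, so that the product becomes a sum. The function name `map_decode` and the dictionary-based inputs are assumptions made for illustration; a real recognizer searches the candidate space rather than enumerating it.

```python
def map_decode(acoustic_logprob, language_logprob):
    """Pick the word sequence W maximizing log P(Y|W) + log P(W).

    acoustic_logprob: dict mapping a candidate word sequence (tuple) to log P(Y|W).
    language_logprob: dict mapping a candidate word sequence (tuple) to log P(W).
    Only candidates present in both dictionaries are considered.
    """
    candidates = acoustic_logprob.keys() & language_logprob.keys()
    if not candidates:
        raise ValueError("no common candidate word sequences")
    return max(candidates,
               key=lambda w: acoustic_logprob[w] + language_logprob[w])
```

In such a sketch a candidate with a slightly better acoustic score can still lose to one that the language model considers far more likely, which is the role the second term of equation (a) plays.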

In the unit matching subsystem 120, an acoustic model provides the first term 
of equation (a). The acoustic model is used to estimate the probability P(Y|W) of a sequence 
of observation vectors Y for a given word string W. For a large vocabulary system, this is 
usually performed by matching the observation vectors against an inventory of speech 
recognition units. A speech recognition unit is represented by a sequence of acoustic 
references. Various forms of speech recognition units may be used. As an example, a whole 
word or even a group of words may be represented by one speech recognition unit. A word 
model (WM) provides for each word of a given vocabulary a transcription in a sequence of 
acoustic references. In most small vocabulary speech recognition systems, a whole word is 
represented by a speech recognition unit, in which case a direct relationship exists between the 
word model and the speech recognition unit. In other small vocabulary systems, for instance 
used for recognizing a relatively large number of words (e.g. several hundreds), or in large 
vocabulary systems, use can be made of linguistically based sub-word units, such as phones, 
diphones or syllables, as well as derivative units, such as fenenes and fenones. For such
systems, a word model is given by a lexicon 134, describing the sequence of sub-word units
relating to a word of the vocabulary, and the sub-word models 132, describing sequences of
acoustic references of the involved speech recognition unit. A word model composer 136
composes the word model based on the sub-word model 132 and the lexicon 134. The
(sub-)word models are typically based on Hidden Markov Models (HMMs), which are widely
used to stochastically model speech signals. Using such an approach, each recognition unit 
(word model or sub-word model) is typically characterized by an HMM, whose parameters are 
estimated from a training set of data. For large vocabulary speech recognition systems usually 
a limited set of, for instance 40, sub-word units is used, since it would require a lot of training 
data to adequately train an HMM for larger units. An HMM state corresponds to an acoustic 
reference. Various techniques are known for modeling a reference, including discrete or 
continuous probability densities. Each sequence of acoustic references which relates to one
specific utterance is also referred to as an acoustic transcription of the utterance. It will be
appreciated that if other recognition techniques than HMMs are used, details of the acoustic 
transcription will be different. 
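
To make the role of the acoustic model concrete, the following sketch computes the likelihood of an observation sequence under a small HMM with the forward algorithm. It assumes discrete output symbols for simplicity, whereas the text allows discrete or continuous densities. The function name `forward_likelihood` and the matrix layout are hypothetical.

```python
def forward_likelihood(obs, init, trans, emit):
    """Likelihood of an observation sequence under a discrete-output HMM.

    obs: non-empty sequence of observation symbol indices.
    init[i]: probability of starting in state i.
    trans[i][j]: probability of moving from state i to state j.
    emit[i][o]: probability that state i emits symbol o.
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n_states = len(init)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][o]
            for j in range(n_states)
        ]
    return sum(alpha)
```

Summing this likelihood over every possible observation sequence of a given length gives 1, which is a convenient sanity check for hand-built models.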

A word level matching system 130 of the figure matches the observation
vectors against all sequences of speech recognition units and provides the likelihoods of a 
match between the vector and a sequence. If sub-word units are used, constraints can be placed 
on the matching by using the lexicon 134 to limit the possible sequence of sub-word units to 
sequences in the lexicon 134. This reduces the outcome to possible sequences of words. 

Furthermore a sentence level matching system 140 may be used which, based 
on a language model (LM), places further constraints on the matching so that the paths 
investigated are those corresponding to word sequences which are proper sequences as 
specified by the language model. As such the language model provides the second term P(W) 
of equation (a). Combining the results of the acoustic model with those of the language model, 
results in an outcome of the unit matching subsystem 120 which is a recognized sentence (RS) 
152. The language model used in pattern recognition may include syntactical and/or 
semantical constraints 142 of the language and the recognition task. A language model based 
on syntactical constraints is usually referred to as a grammar 144. The grammar 144 used by 
the language model provides the probability of a word sequence $W = w_1 w_2 w_3 \ldots w_q$, which in
principle is given by:

$$P(W) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_q \mid w_1 w_2 w_3 \ldots w_{q-1}).$$
Since in practice it is infeasible to reliably estimate the conditional word probabilities for all
words and all sequence lengths in a given language, N-gram word models are widely used. In
an N-gram model, the term $P(w_j \mid w_1 w_2 w_3 \ldots w_{j-1})$ is approximated by $P(w_j \mid w_{j-N+1} \ldots w_{j-1})$. In
practice, bigrams or trigrams are used. In a trigram, the term $P(w_j \mid w_1 w_2 w_3 \ldots w_{j-1})$ is
approximated by $P(w_j \mid w_{j-2} w_{j-1})$.
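
The N-gram approximation can be sketched for N = 2 with maximum-likelihood counts. This is an illustration only: the function names, the `<s>` start-of-sentence token, and the absence of smoothing are all assumptions of the sketch.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams from tokenized sentences (lists of words)."""
    history = Counter()
    bigrams = Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words)      # "<s>" marks the sentence start (assumed)
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, cur)] += 1
    return history, bigrams

def bigram_probability(words, history, bigrams):
    """Approximate P(W) as the product of P(w_j | w_{j-1}), without smoothing."""
    prob = 1.0
    tokens = ["<s>"] + list(words)
    for prev, cur in zip(tokens, tokens[1:]):
        if history[prev] == 0:
            return 0.0                      # unseen history: no estimate available
        prob *= bigrams[(prev, cur)] / history[prev]
    return prob
```

Because no smoothing is applied, any word pair absent from the training data drives the sequence probability to zero; practical language models avoid this with smoothing or back-off.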

The speech processing system according to the invention may be implemented 
using conventional hardware. For instance, a speech recognition system may be implemented 
on a computer, such as a PC, where the speech input is received via a microphone and 
digitized by a conventional audio interface card. All additional processing takes place in the 
form of software procedures executed by the CPU. In particular, the speech may be received 
via a telephone connection, e.g. using a conventional modem in the computer. The speech 
processing may also be performed using dedicated hardware, e.g. built around a DSP. 

The noise elimination according to the invention may be performed in a
pre-processing step before the spectral analysis subsystem 110. Preferably, the noise elimination is
integrated in the spectral analysis subsystem 110, for instance to avoid requiring several
conversions from the time domain to the spectral domain and vice versa. All
hardware and processing capabilities for performing the invention are normally present in a 
speech recognition or speech coding system. The noise elimination technique according to the 
invention is normally executed on a processor, such as a DSP or microprocessor of a personal 
computer, under control of a suitable program. Programming the elementary functions of the 
noise elimination technique, such as performing a conversion from the time domain to the 
spectral domain, falls well within the range of a skilled person. 
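
One such elementary function, blocking samples into overlapping frames and converting a frame to magnitude spectral components, might look like the sketch below. It uses a direct discrete Fourier transform for clarity rather than an FFT, and the function names and the choice to return only bins 0 to N/2 are assumptions of the sketch.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames of frame_len samples, advancing by hop."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def magnitude_spectrum(frame):
    """Magnitudes |Y(k)| for k = 0 .. N/2, computed with a direct O(N^2) DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags
```

At the 6.67 kHz sampling rate mentioned earlier, a 32 msec frame corresponds to roughly 213 samples and a 16 msec shift to roughly 107 samples; an FFT would normally replace the direct transform for frames of that size.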

Detailed description of the invention 

Details are given for speech signals. Other signals can be processed in a
corresponding way. As described above, in the discrete-time domain noisy speech y can be
represented as:

$$y(i) = s(i) + n(i), \qquad 0 \le i \le T-1, \tag{1}$$

where s, n, y denote clean speech, noise and noisy speech respectively, and where T denotes
the length of the speech and i is the time index. Using conventional techniques, such as a Fast
Fourier transform, the speech signal y can be transformed into a set of spectral components
Y(k). It will be appreciated that if a suitable conversion to the spectral domain has already taken
place, it is sufficient to retrieve the spectral components resulting from such a conversion.

Let $|S(k)|$, $|N(k)|$ and $|Y(k)|$ be the corresponding magnitudes of the spectrums
of the time-domain signals s, n, and y, respectively. Using the conventional spectral
subtraction techniques, individual spectral components are forced to be positive. It does not
allow the situation wherein an individual spectral component Y(k) of the noisy speech y is less
than the corresponding spectral component N(k) of the noise signal n.

The following correlation is assumed to exist between the speech signal and the
noise signal:

$$|Y(k)|^a = |S(k)|^a + |N(k)|^a + \gamma_{sn}|S(k)||N(k)| \tag{6}$$
where $\gamma_{sn}$ denotes the correlation coefficient of speech and noise in the spectral domain and a
could be 1 or 2 for magnitude and power spectrum, respectively. Using this correlation as the
basis for estimating the clean speech spectrum (and as such using a correlated spectral
subtraction) makes it possible to have the situation wherein $|Y(k)|^a < |N(k)|^a$ if $\gamma_{sn} < 0$.

Let $|\hat S(k)|$ and $|\hat N(k)|$ be the estimates of the magnitude spectrums of the clean
speech signal s and the noise signal n, respectively. Preferably, $|\hat N(k)|$ is estimated from a pause
(non-speech segment). Based on equation (6), $|\hat S(k)|$ can be calculated by solving the equation
in one step or by using an iterative algorithm. The one-step solutions are given in the following
equations (7) and (8) for the cases wherein a = 1 or a = 2, respectively:

$$|\hat S(k)| = \frac{|Y(k)| - |\hat N(k)|}{1 + \gamma_{sn}|\hat N(k)|}, \quad \text{if } a = 1 \tag{7}$$

$$|\hat S(k)| = \frac{-\gamma_{sn}|\hat N(k)| \pm \sqrt{\gamma_{sn}^2|\hat N(k)|^2 + 4\left(|Y(k)|^2 - |\hat N(k)|^2\right)}}{2}, \quad \text{if } a = 2 \tag{8}$$

Equation (8) has two possible solutions. The positive solution which is greater than
$(|Y(k)|^2 - |\hat N(k)|^2)$ or close to $(|Y(k)|^2 - |\hat N(k)|^2)$ will be chosen since the direction of NSR decrement
is preferred.
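
The one-step solutions can be sketched for a single spectral component as follows. The function name `css_one_step` is hypothetical, and always taking the root with the plus sign for a = 2 is this sketch's simplified reading of the selection rule above, not a statement of the patented method.

```python
import math

def css_one_step(y, n, gamma, a):
    """Solve y^a = s^a + n^a + gamma*s*n for s in one spectral bin.

    y, n: magnitude of the noisy component and of the estimated noise component.
    gamma: correlation coefficient; a: 1 (magnitude) or 2 (power).
    """
    if a == 1:
        denom = 1.0 + gamma * n
        if denom == 0.0:
            raise ZeroDivisionError("1 + gamma*n is zero for this bin")
        return (y - n) / denom
    if a == 2:
        disc = gamma ** 2 * n ** 2 + 4.0 * (y ** 2 - n ** 2)
        if disc < 0.0:
            raise ValueError("no real solution for this bin")
        # Root with the plus sign (assumed choice for this sketch).
        return (-gamma * n + math.sqrt(disc)) / 2.0
    raise ValueError("a must be 1 or 2")
```

Note that with a negative coefficient the a = 2 case can return a positive estimate even when the noisy magnitude is below the noise magnitude, for example `css_one_step(2.9, 3.0, -1.0, 2)`.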

A preferred iterative algorithm for estimating $|\hat S(k)|$ with a specified correlation
coefficient $\gamma_{sn}$ is as follows:

LOOP k (0 : N-1)
    Initialization: $|\hat S^{(0)}(k)|^a = |Y(k)|^a - |\hat N(k)|^a$    (9)
    LOOP $\ell$
        $|S'(k)|^a = |Y(k)|^a - |\hat N(k)|^a - \gamma_{sn}|\hat S^{(\ell)}(k)||\hat N(k)|$    (10)
        $|\hat S^{(\ell+1)}(k)|^a = \frac{|S'(k)|^a + |\hat S^{(\ell)}(k)|^a}{2}$    (11)
        IF $\frac{|\hat S^{(\ell+1)}(k)|^a - |\hat S^{(\ell)}(k)|^a}{|\hat S^{(\ell)}(k)|^a}$ < Threshold THEN STOP
        ELSE $\ell = \ell + 1$
    END LOOP $\ell$
END LOOP k

The outer loop k deals with all individual spectral components. The inner loop 
is performed until the iteration has converged (no significant change occurs anymore in the 
estimated speech). 
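
A minimal sketch of the inner loop for one spectral component is given below. It assumes that the update averages the auxiliary estimate with the current estimate, uses the absolute relative change as the stopping test, and clamps the estimate at zero before taking the root; the clamp, the `max_iter` guard and the function name `css_iterative` are additions of this sketch and are not taken from the source.

```python
def css_iterative(y, n, gamma, a=2, threshold=1e-9, max_iter=500):
    """Iteratively estimate the clean-speech magnitude for one spectral bin.

    Works on power = |S|^a and returns the magnitude |S|.
    """
    target = y ** a - n ** a
    power = target                                   # initial estimate of |S|^a
    for _ in range(max_iter):
        mag = max(power, 0.0) ** (1.0 / a)           # assumed clamp before the root
        aux = target - gamma * mag * n               # auxiliary estimate |S'|^a
        new_power = 0.5 * (aux + power)              # average with current estimate
        converged = abs(new_power - power) <= threshold * abs(power)
        power = new_power
        if converged:
            break
    return max(power, 0.0) ** (1.0 / a)
```

At convergence the estimate satisfies the correlation equation, so for a = 1 or a = 2 the result can be cross-checked against the one-step solution; with a zero coefficient the loop stops after the first pass and returns the conventional subtraction result.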



The above described algorithm can be used for a fixed correlation coefficient
$\gamma_{sn}$. In a further embodiment according to the invention, the correlation coefficient $\gamma_{sn}$ is
estimated based on the actual input signal y. To this end, the function of negative spectrum
ratio (NSR) for the correlated spectral subtraction algorithm according to the invention is
defined as follows:

$$NSR = \frac{1}{M}\sum_{k=0}^{M-1} f_{NS}\left(|Y(k)|^a - |\hat N(k)|^a - \gamma_{sn}|\hat S(k)||\hat N(k)|\right) \tag{12}$$

The $f_{NS}$ function shown in equation (5) is a zero-one function. In order to
derive the relation between the correlation coefficient $\gamma_{sn}$ and NSR, a smoothed zero-one,
sigmoid function family is preferably used. For example, the following function is
advantageously used for further derivation due to its differentiability.

$$f_{NS}(x) = \frac{1}{1 + \exp(-\alpha \cdot x + \beta)} \tag{13}$$
Exemplary values for $\alpha$ and $\beta$ are 1.0 and 0.0, respectively.
Then, the expected negative spectrum ratio R is defined as follows:

$$R = E\{f_{NS}\} = \frac{1}{M}\sum_{k=0}^{M-1} f_{NS}\left(|Y(k)|^a - |\hat N(k)|^a - \gamma_{sn}|\hat S(k)||\hat N(k)|\right) \tag{14}$$



By applying the theory of adaptive learning algorithm, the correlation coefficient is preferably 
obtained by the following gradient operation: 

The correlation coefficient can be learned along the direction of decrease in NSR. This reduces
the residual noise in the estimated spectrum obtained with the proposed correlated spectral
subtraction (CSS) algorithm.
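
One way such a gradient-based update could look is sketched below. The sketch evaluates the sigmoid of equation (13) literally, holds the speech estimates fixed while differentiating with respect to the coefficient, and applies a plain gradient-descent step. The fixed-estimate simplification, the step size `step`, and all function names are assumptions of this sketch, not details given in the source. With the sigmoid $f$ of equation (13), the derivative with respect to its argument is $\alpha f (1 - f)$, so the sign of $\alpha$ determines in which direction the coefficient moves.

```python
import math

def smoothed_indicator(x, alpha=1.0, beta=0.0):
    """Sigmoid 1 / (1 + exp(-alpha*x + beta)), evaluated without overflow."""
    u = -alpha * x + beta
    if u >= 0:
        e = math.exp(-u)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(u))

def expected_nsr(y, n, s, gamma, a=2, alpha=1.0, beta=0.0):
    """R of equation (14) for fixed speech estimates s (lists of magnitudes)."""
    total = 0.0
    for yk, nk, sk in zip(y, n, s):
        total += smoothed_indicator(yk ** a - nk ** a - gamma * sk * nk, alpha, beta)
    return total / len(y)

def expected_nsr_gradient(y, n, s, gamma, a=2, alpha=1.0, beta=0.0):
    """Derivative of R with respect to gamma, with s held fixed (assumption)."""
    total = 0.0
    for yk, nk, sk in zip(y, n, s):
        f = smoothed_indicator(yk ** a - nk ** a - gamma * sk * nk, alpha, beta)
        total += alpha * f * (1.0 - f) * (-sk * nk)   # chain rule through the argument
    return total / len(y)

def gradient_step(y, n, s, gamma, step=0.1, **kwargs):
    """One plain gradient-descent update of gamma (hypothetical step size)."""
    return gamma - step * expected_nsr_gradient(y, n, s, gamma, **kwargs)
```

In a full procedure the speech estimates would be recomputed with the updated coefficient before the next step, and the loop would end once the relative change of the coefficient falls below a threshold.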

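The algorithm for estimating $|\hat S(k)|$ with a minimum-NSR-based correlation
coefficient $\gamma_{sn}$ is as follows:

Initialization: m = 0
                $\gamma_{sn}^{(0)}$ = non-zero, initial guess.
LOOP m
    (block 1)
    LOOP k (0 : N-1)
        $\ell = 0$
        $|\hat S^{(0)}(k)|^a = |Y(k)|^a - |\hat N(k)|^a$
        LOOP $\ell$
            $|S'(k)|^a = |Y(k)|^a - |\hat N(k)|^a - \gamma_{sn}^{(m)}|\hat S^{(\ell)}(k)||\hat N(k)|$
            $|\hat S^{(\ell+1)}(k)|^a = \frac{|S'(k)|^a + |\hat S^{(\ell)}(k)|^a}{2}$
            IF $\frac{|\hat S^{(\ell+1)}(k)|^a - |\hat S^{(\ell)}(k)|^a}{|\hat S^{(\ell)}(k)|^a}$ < Threshold 1 THEN STOP
            ELSE $\ell = \ell + 1$
        END LOOP $\ell$
    END LOOP k
    (update of $\gamma_{sn}^{(m)}$ to $\gamma_{sn}^{(m+1)}$ by the gradient operation)
    IF $\frac{\gamma_{sn}^{(m+1)} - \gamma_{sn}^{(m)}}{\gamma_{sn}^{(m)}}$ < Threshold 2 THEN STOP
END LOOP m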

The block indicated as block 1 is the same as used for the iterative algorithm assuming a fixed
correlation coefficient $\gamma_{sn}$. Instead of using the iterative solution in block 1, the one-step
solution of equations (7) or (8) may also be used.

It will be appreciated that after the noise has been eliminated as described 
above, the resulting estimated spectral components of the noise-eliminated signal may be 
converted back to the time-domain. Where possible the spectral components may be used 
directly for the subsequent further processing, like coding or automatically recognizing the 
signal.

1. A method for reducing noise in a noisy time-varying input signal y, such as a
speech signal; the method including:

receiving the noisy time-varying input signal y;

deriving from the input signal y a plurality of spectral component signals
representing respective magnitudes $|Y(k)|$ of spectral components of the input signal y;

obtaining a correlation coefficient $\gamma_{sn}$ indicative of a correlation in the spectral
domain between a clean speech signal component s and a noise signal component n present in
the input signal y (y = s + n); and

estimating magnitudes of respective noise-suppressed spectral components
$\hat S(k)$ by solving a correlation equation giving a relationship between the magnitudes of the
respective spectral components $|Y(k)|$ of the noisy input signal y, the spectral components
$|S(k)|$ of the clean speech signal s, and the spectral components $|N(k)|$ of the noise signal n,
where the equation includes the correlation based on the obtained correlation coefficient $\gamma_{sn}$.

2. The method as claimed in claim 1, wherein the correlation coefficient $\gamma_{sn}$ is
predetermined.

3. The method as claimed in claim 1, wherein the step of obtaining the correlation
coefficient $\gamma_{sn}$ includes estimating the correlation coefficient $\gamma_{sn}$.

4. The method as claimed in claim 3, wherein the step of estimating the
correlation coefficient $\gamma_{sn}$ includes determining a minimum negative spectrum ratio.

5. The method as claimed in claim 4, wherein the negative spectrum ratio NSR
represents a proportion of spectral components $\hat S(k)$ which would be negative based on the
solution of the correlation equation.

6. The method as claimed in claim 5, wherein the method includes:
initializing the correlation coefficient $\gamma_{sn}$ with a non-zero value; and
iteratively:

performing the step of solving the correlation equation to obtain $|\hat S(k)|$;

and

estimating a new correlation coefficient based on a gradient descent of
the negative spectrum ratio NSR for $\hat S(k)$.

7. The method as claimed in claim 1, wherein the step of solving the correlation
equation includes iteratively estimating the noise-suppressed spectrum $\hat S(k)$.

8. The method as claimed in claim 7, wherein the method includes calculating an
initial estimate of a magnitude of the noise-suppressed spectrum $\hat S^{(0)}(k)$ by subtracting a
magnitude of an estimate of the respective spectral components $\hat N(k)$ of the noise signal n
from a magnitude of the respective spectral components Y(k) of the noisy input signal y.

9. The method as claimed in claim 7, wherein the step of performing the iterative
spectrum estimation includes in each iteration:

estimating a magnitude of an auxiliary noise-suppressed spectrum based on the
correlation equation where a term with the correlation coefficient $\gamma_{sn}$ is based on a current
estimate of a magnitude of the noise-suppressed spectrum $\hat S^{(\ell)}(k)$; and

estimating a new magnitude of the noise-suppressed spectrum $\hat S^{(\ell+1)}(k)$ based on the
estimated magnitude of the auxiliary noise-suppressed spectrum and on the current estimate of
a magnitude of the noise-suppressed spectrum $\hat S^{(\ell)}(k)$.

10. An apparatus for reducing noise in a noisy time-varying input signal y, such as
a speech signal; the apparatus including:

an input for receiving the noisy time-varying input signal y;

means for deriving from the input signal y a plurality of spectral component
signals representing respective magnitudes $|Y(k)|$ of spectral components of the input signal y;

means for obtaining a correlation coefficient $\gamma_{sn}$ indicative of a correlation in the
spectral domain between a clean speech signal component s and a noise signal component n
present in the input signal y (y = s + n); and

means for estimating magnitudes of respective noise-suppressed spectral
components $\hat S(k)$ by solving a correlation equation giving a relationship between the
magnitudes of the respective spectral components $|Y(k)|$ of the noisy input signal y, the
spectral components $|S(k)|$ of the clean speech signal s, and the spectral components $|N(k)|$ of
the noise signal n, where the equation includes the correlation based on the obtained
correlation coefficient $\gamma_{sn}$.